Image Quality Assessment

ABSTRACT

Systems and methods are disclosed for image signal processing. For example, methods may include receiving a first image from a first image sensor; receiving a second image from a second image sensor; stitching the first image and the second image to obtain a stitched image; identifying an image portion of the stitched image that is positioned on a stitching boundary of the stitched image; and inputting the image portion to a machine learning module to obtain a score, wherein the machine learning module has been trained using training data that included image portions labeled to reflect an absence of stitching and image portions labeled to reflect a presence of stitching, wherein the image portions labeled to reflect a presence of stitching included stitching boundaries of stitched images.

CROSS REFERENCE TO RELATED APPLICATION(S)

This application is a continuation of U.S. patent application Ser. No. 15/455,446, filed Mar. 10, 2017, the disclosure of which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

This disclosure relates to image quality assessment.

BACKGROUND

Image capture devices, such as cameras, may capture content as images or video. Light may be received and focused via a lens and may be converted to an electronic image signal by an image sensor. The image signal may be processed by an image signal processor (ISP) to form an image, which may be stored and/or encoded. In some implementations, multiple images or video frames from different image sensors may include spatially adjacent or overlapping content, which may be stitched together to form a larger image with a larger field of view. The image stitching process may introduce distortions that depend on the objects appearing within the field of view of the camera and/or the relative positions and orientations of those objects.

SUMMARY

Disclosed herein are implementations of image quality assessment.

In a first aspect, the subject matter described in this specification can be embodied in systems that include a first image sensor configured to capture a first image and a second image sensor configured to capture a second image. The systems include a processing apparatus that is configured to receive the first image from the first image sensor; receive the second image from the second image sensor; stitch the first image and the second image to obtain a stitched image; identify an image portion of the stitched image that is positioned on a stitching boundary of the stitched image; input the image portion to a machine learning module to obtain a score, wherein the machine learning module has been trained using training data that included image portions labeled to reflect an absence of stitching and image portions labeled to reflect a presence of stitching, wherein the image portions labeled to reflect a presence of stitching included stitching boundaries of stitched images; select a parameter of a stitching algorithm based at least in part on the score; stitch, using the parameter, the first image and the second image to obtain a composite image; and store, display, or transmit an output image based on the composite image.

In a second aspect, the subject matter described in this specification can be embodied in methods that include receiving a first image from a first image sensor; receiving a second image from a second image sensor; stitching the first image and the second image to obtain a stitched image; identifying an image portion of the stitched image that is positioned on a stitching boundary of the stitched image; and inputting the image portion to a machine learning module to obtain a score, wherein the machine learning module has been trained using training data that included image portions labeled to reflect an absence of stitching and image portions labeled to reflect a presence of stitching, wherein the image portions labeled to reflect a presence of stitching included stitching boundaries of stitched images.

In a third aspect, the subject matter described in this specification can be embodied in methods that include presenting images to humans; receiving scores for the images from the humans; training a machine learning module with training data that includes image portions from the images labeled with the scores for the images from the humans; and inputting an image portion from a first image to the trained machine learning module to obtain an estimate of quality of the first image.

These and other aspects of the present disclosure are disclosed in the following detailed description, the appended claims, and the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure is best understood from the following detailed description when read in conjunction with the accompanying drawings. It is emphasized that, according to common practice, the various features of the drawings are not to scale. On the contrary, the dimensions of the various features are arbitrarily expanded or reduced for clarity.

FIG. 1 is a diagram of an example of an image capture system for content capture.

FIG. 2A is a block diagram of an example system configured for image capture and stitching.

FIG. 2B is a block diagram of an example system configured for image capture and stitching.

FIG. 3 is a flowchart of an example technique for image capture and stitching.

FIG. 4 is a flowchart of an example technique for training a machine learning module to enable evaluation of image stitching quality.

FIG. 5 is a flowchart of an example technique for training a machine learning module to estimate subjective image quality.

FIG. 6 shows an example layout of a stitched image.

FIG. 7A is a block diagram of an example machine learning module configured for image quality assessment.

FIG. 7B is a block diagram of an example machine learning module configured for image quality assessment.

FIG. 8 is a flowchart of an example technique for image capture and stitching.

DETAILED DESCRIPTION

This document includes disclosure of systems, apparatus, and methods for image quality assessment to enable enhancement of the quality of images generated by image capture systems. For example, some image capture systems include multiple (e.g., two or six) image sensors and generate composite images by stitching images from two or more sensors together. Stitching may be a dynamic, data-dependent operation that may introduce distortions into the resulting composite image. For example, a slight misalignment of pixels from two images being stitched can result in discontinuities (e.g., lines at which color changes abruptly) in the composite, stitched image, which can be quite noticeable to humans and significantly degrade image quality. When designing and/or applying processes for stitching or other image processing, it is useful to be able to consistently assess image quality to provide feedback that enables those processes to be adjusted (e.g., in real-time during image capture processing or in a laboratory where image capture systems are being designed) to improve image quality.

Stitching is a process of combining images with overlapping fields of view to produce a composite image (e.g., to form a panoramic image). Stitching may include aligning the pixels of two images being combined in a region (which may be called a seam) along a boundary between sections of a composite image that are respectively based on two different input images, called a stitching boundary. For example, stitching may include applying parallax correction (e.g., binocular disparity correction) to align pixels corresponding to objects appearing in the fields of view of multiple image sensors. For example, because the binocular disparity depends on the distance of an object from the image sensors, the stitching process may be data dependent in the sense that it utilizes image data reflecting positions of objects in the fields of view of the sensors during the capture of a particular image (e.g., a particular frame of video) to determine the mappings of pixels from input images to a composite image. It may be advantageous to have a consistent assessment of image quality available at the time a composite image is being captured and encoded so that parameters (e.g., the number of dimensions considered) of a stitching process may be adjusted to best suit a current scene.

The quality of stitching in an image may be assessed by inputting portions (e.g., blocks of pixels) of a stitched image from along the stitching boundary to a machine learning module that has been trained to distinguish between portions of image data from a single image sensor and portions of data that have been stitched. For example, the machine learning module (e.g., a convolutional neural network or a support vector machine) may be trained with two sets of data. The first set of data includes image portions that consist of pixels captured with a single image sensor and are labeled with a score (e.g., 1) corresponding to non-seam image portions. The second set of training data includes image portions that include pixels based on pixels from at least two different images (captured with different image sensors) that have been stitched together. Portions of data in the second set of training data may be labeled with a score (e.g., 0) corresponding to seam image portions. When new composite (e.g., panoramic) images are stitched, portions of data from along the stitching boundary may be input to the trained machine learning module to obtain a score reflecting a prediction (e.g., an estimated probability) that the portion is from a stitched seam. A score close to the score for non-seam data may indicate high quality stitching in the portion. A score close to the score for seam data may indicate low quality stitching in the portion. In some implementations, scores for multiple portions from along the stitching boundary are determined and combined (e.g., averaged) to determine an estimate of the quality of the stitching of the composite image. For example, this estimate of stitching quality for the image may be used as feedback to determine whether and/or how to adjust a parameter of a stitching algorithm to improve the image quality of a composite image.
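
A minimal sketch of this scoring step follows, assuming a trained classifier with a scikit-learn-style predict_proba interface and the example label convention above (class 1 = non-seam, class 0 = seam); the function name and block shape are illustrative, not part of the disclosure:

```python
import numpy as np

def stitch_quality_score(seam_blocks, model):
    """Average per-block seam scores into one stitching-quality estimate.

    seam_blocks: iterable of (8, 8, 3) pixel blocks taken from along the
    stitching boundary. model: any classifier with a scikit-learn-style
    predict_proba, trained with the convention above (class 1 = non-seam),
    so a mean score near 1 suggests well-hidden stitching.
    """
    X = np.stack([np.asarray(b, dtype=float).reshape(-1) for b in seam_blocks])
    scores = model.predict_proba(X)[:, 1]  # probability of the non-seam class
    return float(scores.mean())
```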

It may also be useful to have automatically determined image quality scores that correlate well with subjective human assessments of image quality. Machine learning modules may be trained with image data that has been labeled with image quality scores provided by humans. A machine learning module trained in this manner may be configured to take image data (e.g., a portion of the pixels in an image) as input and output an image quality score that correlates well with subjective human scores. For example, images may be presented to humans and image quality scores for those images may be received from the humans and used to label one or more portions of image data from the respective images. For example, the labeled image data may be used to train a neural network (e.g., a convolutional neural network).

Implementations are described in detail with reference to the drawings, which are provided as examples so as to enable those skilled in the art to practice the technology. The figures and examples are not meant to limit the scope of the present disclosure to a single implementation or embodiment, and other implementations and embodiments are possible by way of interchange of, or combination with, some or all of the described or illustrated elements. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to same or like parts.

FIG. 1 is a diagram of an example of an image capture system 100 for content capture. As shown in FIG. 1, an image capture system 100 may include an image capture apparatus 110, an external user interface (UI) device 120, or a combination thereof.

In some implementations, the image capture apparatus 110 may be a multi-face apparatus and may include multiple image capture devices, such as image capture devices 130, 132, 134 as shown in FIG. 1, arranged in a structure 140, such as a cube-shaped cage as shown. Although three image capture devices 130, 132, 134 are shown for simplicity in FIG. 1, the image capture apparatus 110 may include any number of image capture devices. For example, the image capture apparatus 110 shown in FIG. 1 may include six cameras, which may include the three image capture devices 130, 132, 134 shown and three cameras not shown.

In some implementations, the structure 140 may have dimensions, such as between 25 mm and 150 mm. For example, the length of each side of the structure 140 may be 105 mm. The structure 140 may include a mounting port 142, which may be removably attachable to a supporting structure, such as a tripod, a photo stick, or any other camera mount (not shown). The structure 140 may be a rigid support structure, such that the relative orientation of the image capture devices 130, 132, 134 of the image capture apparatus 110 may be maintained in relatively static or fixed alignment, except as described herein.

The image capture apparatus 110 may obtain, or capture, image content, such as images, video, or both, with a 360° field-of-view, which may be referred to herein as panoramic or spherical content. For example, each of the image capture devices 130, 132, 134 may include respective lenses, for receiving and focusing light, and respective image sensors for converting the received and focused light to an image signal, such as by measuring or sampling the light, and the multiple image capture devices 130, 132, 134 may be arranged such that respective image sensors and lenses capture a combined field-of-view characterized by a spherical or near spherical field-of-view.

In some implementations, each of the image capture devices 130, 132, 134 may have a respective field-of-view 170, 172, 174, such as a field-of-view 170, 172, 174 that includes 90° in a lateral dimension 180, 182, 184 and includes 120° in a longitudinal dimension 190, 192, 194. In some implementations, image capture devices 130, 132, 134 having overlapping fields-of-view 170, 172, 174, or the image sensors thereof, may be oriented at defined angles, such as at 90°, with respect to one another. In some implementations, the image sensor of the image capture device 130 is directed along the X axis, the image sensor of the image capture device 132 is directed along the Y axis, and the image sensor of the image capture device 134 is directed along the Z axis. The respective fields-of-view 170, 172, 174 for adjacent image capture devices 130, 132, 134 may be oriented to allow overlap for a stitching function. For example, the longitudinal dimension 190 of the field-of-view 170 for the image capture device 130 may be oriented at 90° with respect to the lateral dimension 184 of the field-of-view 174 for the image capture device 134, the lateral dimension 180 of the field-of-view 170 for the image capture device 130 may be oriented at 90° with respect to the longitudinal dimension 192 of the field-of-view 172 for the image capture device 132, and the lateral dimension 182 of the field-of-view 172 for the image capture device 132 may be oriented at 90° with respect to the longitudinal dimension 194 of the field-of-view 174 for the image capture device 134.

The image capture apparatus 110 shown in FIG. 1 may have 420° angular coverage in vertical and/or horizontal planes by the successive overlap of 90°, 120°, 90°, 120° respective fields-of-view 170, 172, 174 (not all shown) for four adjacent image capture devices 130, 132, 134 (not all shown). For example, fields-of-view 170, 172 for the image capture devices 130, 132 and fields-of-view (not shown) for two image capture devices (not shown) opposite the image capture devices 130, 132 respectively may be combined to provide 420° angular coverage in a horizontal plane. In some implementations, the overlap between fields-of-view of image capture devices 130, 132, 134 having a combined field-of-view including less than 360° angular coverage in a vertical and/or horizontal plane may be aligned and merged or combined to produce a panoramic image. For example, the image capture apparatus 110 may be in motion, such as rotating, and source images captured by at least one of the image capture devices 130, 132, 134 may be combined to form a panoramic image. As another example, the image capture apparatus 110 may be stationary, and source images captured contemporaneously by each image capture device 130, 132, 134 may be combined to form a panoramic image.

In some implementations, an image capture device 130, 132, 134 may include a lens 150, 152, 154 or other optical element. An optical element may include one or more lenses, macro lenses, zoom lenses, special-purpose lenses, telephoto lenses, prime lenses, achromatic lenses, apochromatic lenses, process lenses, wide-angle lenses, ultra-wide-angle lenses, fisheye lenses, infrared lenses, ultraviolet lenses, perspective control lenses, other lenses, and/or other optical elements. In some implementations, a lens 150, 152, 154 may be a fisheye lens and produce fisheye, or near-fisheye, field-of-view images. For example, the respective lenses 150, 152, 154 of the image capture devices 130, 132, 134 may be fisheye lenses. In some implementations, images captured by two or more image capture devices 130, 132, 134 of the image capture apparatus 110 may be combined by stitching or merging fisheye projections of the captured images to produce an equirectangular planar image. For example, a first fisheye image may be a round or elliptical image, and may be transformed to a first rectangular image, a second fisheye image may be a round or elliptical image, and may be transformed to a second rectangular image, and the first and second rectangular images may be arranged side-by-side, which may include overlapping, and stitched together to form the equirectangular planar image.

Although not expressly shown in FIG. 1, in some implementations, an image capture device 130, 132, 134 may include one or more image sensors, such as a charge-coupled device (CCD) sensor, an active pixel sensor (APS), a complementary metal-oxide semiconductor (CMOS) sensor, an N-type metal-oxide-semiconductor (NMOS) sensor, and/or any other image sensor or combination of image sensors.

Although not expressly shown in FIG. 1, in some implementations, an image capture apparatus 110 may include one or more microphones, which may receive, capture, and record audio information, which may be associated with images acquired by the image sensors.

Although not expressly shown in FIG. 1, the image capture apparatus 110 may include one or more other information sources or sensors, such as an inertial measurement unit (IMU), a global positioning system (GPS) receiver component, a pressure sensor, a temperature sensor, a heart rate sensor, or any other unit, or combination of units, that may be included in an image capture apparatus.

In some implementations, the image capture apparatus 110 may interface with or communicate with an external device, such as the external user interface (UI) device 120, via a wired (not shown) or wireless (as shown) computing communication link 160. Although a single computing communication link 160 is shown in FIG. 1 for simplicity, any number of computing communication links may be used. Although the computing communication link 160 shown in FIG. 1 is shown as a direct computing communication link, an indirect computing communication link, such as a link including another device or a network, such as the internet, may be used. In some implementations, the computing communication link 160 may be a Wi-Fi link, an infrared link, a Bluetooth (BT) link, a cellular link, a ZigBee link, a near field communications (NFC) link, such as an ISO/IEC 23243 protocol link, an Advanced Network Technology interoperability (ANT+) link, and/or any other wireless communications link or combination of links. In some implementations, the computing communication link 160 may be an HDMI link, a USB link, a digital video interface link, a display port interface link, such as a Video Electronics Standards Association (VESA) digital display interface link, an Ethernet link, a Thunderbolt link, and/or other wired computing communication link.

In some implementations, the user interface device 120 may be a computing device, such as a smartphone, a tablet computer, a phablet, a smart watch, a portable computer, and/or another device or combination of devices configured to receive user input, communicate information with the image capture apparatus 110 via the computing communication link 160, or receive user input and communicate information with the image capture apparatus 110 via the computing communication link 160.

In some implementations, the image capture apparatus 110 may transmit images, such as panoramic images, or portions thereof, to the user interface device 120 via the computing communication link 160, and the user interface device 120 may store, process, display, or a combination thereof the panoramic images.

In some implementations, the user interface device 120 may display, or otherwise present, content, such as images or video, acquired by the image capture apparatus 110. For example, a display of the user interface device 120 may be a viewport into the three-dimensional space represented by the panoramic images or video captured or created by the image capture apparatus 110.

In some implementations, the user interface device 120 may communicate information, such as metadata, to the image capture apparatus 110. For example, the user interface device 120 may send orientation information of the user interface device 120 with respect to a defined coordinate system to the image capture apparatus 110, such that the image capture apparatus 110 may determine an orientation of the user interface device 120 relative to the image capture apparatus 110. Based on the determined orientation, the image capture apparatus 110 may identify a portion of the panoramic images or video captured by the image capture apparatus 110 for the image capture apparatus 110 to send to the user interface device 120 for presentation as the viewport. In some implementations, based on the determined orientation, the image capture apparatus 110 may determine the location of the user interface device 120 and/or the dimensions for viewing of a portion of the panoramic images or video.

In an example, a user may rotate (sweep) the user interface device 120 through an arc or path 122 in space, as indicated by the arrow shown at 122 in FIG. 1. The user interface device 120 may communicate display orientation information to the image capture apparatus 110 using a communication interface such as the computing communication link 160. The image capture apparatus 110 may provide an encoded bitstream to enable viewing of a portion of the panoramic content corresponding to a portion of the environment of the display location as the user interface device 120 traverses the path 122. Accordingly, display orientation information from the user interface device 120 may be transmitted to the image capture apparatus 110 to control user selectable viewing of captured images and/or video.

In some implementations, the image capture apparatus 110 may communicate with one or more other external devices (not shown) via wired or wireless computing communication links (not shown).

In some implementations, data, such as image data, audio data, and/or other data, obtained by the image capture apparatus 110 may be incorporated into a combined multimedia stream. For example, the multimedia stream may include a video track and/or an audio track. As another example, information from various metadata sensors and/or sources within and/or coupled to the image capture apparatus 110 may be processed to produce a metadata track associated with the video and/or audio track. The metadata track may include metadata, such as white balance metadata, image sensor gain metadata, sensor temperature metadata, exposure time metadata, lens aperture metadata, bracketing configuration metadata and/or other parameters. In some implementations, a multiplexed stream may be generated to incorporate a video and/or audio track and one or more metadata tracks.

In some implementations, the user interface device 120 may implement or execute one or more applications, such as GoPro Studio, GoPro App, or both, to manage or control the image capture apparatus 110. For example, the user interface device 120 may include an application for controlling camera configuration, video acquisition, video display, or any other configurable or controllable aspect of the image capture apparatus 110.

In some implementations, the user interface device 120, such as via an application (e.g., GoPro App), may generate and share, such as via a cloud-based or social media service, one or more images, or short video clips, such as in response to user input.

In some implementations, the user interface device 120, such as via an application (e.g., GoPro App), may remotely control the image capture apparatus 110, such as in response to user input.

In some implementations, the user interface device 120, such as via an application (e.g., GoPro App), may display unprocessed or minimally processed images or video captured by the image capture apparatus 110 contemporaneously with capturing the images or video by the image capture apparatus 110, such as for shot framing, which may be referred to herein as a live preview, and which may be performed in response to user input.

In some implementations, the user interface device 120, such as via an application (e.g., GoPro App), may mark one or more key moments contemporaneously with capturing the images or video by the image capture apparatus 110, such as with a HiLight Tag, such as in response to user input.

In some implementations, the user interface device 120, such as via an application (e.g., GoPro App), may display, or otherwise present, marks or tags associated with images or video, such as HiLight Tags, such as in response to user input. For example, marks may be presented in a GoPro Camera Roll application for location review and/or playback of video highlights.

In some implementations, the user interface device 120, such as via an application (e.g., GoPro App), may wirelessly control camera software, hardware, or both. For example, the user interface device 120 may include a web-based graphical interface accessible by a user for selecting a live or previously recorded video stream from the image capture apparatus 110 for display on the user interface device 120.

In some implementations, the user interface device 120 may receive information indicating a user setting, such as an image resolution setting (e.g., 3840 pixels by 2160 pixels), a frame rate setting (e.g., 60 frames per second (fps)), a location setting, and/or a context setting, which may indicate an activity, such as mountain biking, in response to user input, and may communicate the settings, or related information, to the image capture apparatus 110.

FIG. 2A is a block diagram of an example system 200 configured for image capture and stitching. The system 200 includes an image capture device 210 (e.g., a camera or a drone) that includes a processing apparatus 212 that is configured to receive a first image from the first image sensor 214 and receive a second image from the second image sensor 216. The processing apparatus 212 may be configured to perform image signal processing (e.g., filtering, stitching, and/or encoding) to generate composite images based on image data from the image sensors 214 and 216. The image capture device 210 includes a communications interface 218 for transferring images to other devices. The image capture device 210 includes a user interface 220, which may allow a user to control image capture functions and/or view images. The image capture device 210 includes a battery 222 for powering the image capture device 210. The components of the image capture device 210 may communicate with each other via the bus 224. The system 200 may be used to implement techniques described in this disclosure, such as the technique 300 of FIG. 3.

The processing apparatus 212 may include one or more processors having single or multiple processing cores. The processing apparatus 212 may include memory, such as random access memory device (RAM), flash memory, or any other suitable type of storage device such as a non-transitory computer readable memory. The memory of the processing apparatus 212 may include executable instructions and data that can be accessed by one or more processors of the processing apparatus 212. For example, the processing apparatus 212 may include one or more DRAM modules such as double data rate synchronous dynamic random-access memory (DDR SDRAM). In some implementations, the processing apparatus 212 may include a digital signal processor (DSP). In some implementations, the processing apparatus 212 may include an application specific integrated circuit (ASIC). For example, the processing apparatus 212 may include a custom image signal processor.

The first image sensor 214 and the second image sensor 216 are configured to detect light of a certain spectrum (e.g., the visible spectrum or the infrared spectrum) and convey information constituting an image as electrical signals (e.g., analog or digital signals). For example, the image sensors 214 and 216 may include charge-coupled devices (CCD) or active pixel sensors in complementary metal-oxide-semiconductor (CMOS). The image sensors 214 and 216 may detect light incident through respective lenses (e.g., a fisheye lens). In some implementations, the image sensors 214 and 216 include analog-to-digital converters. In some implementations, the image sensors 214 and 216 are held in a fixed orientation with respective fields of view that overlap.

The image capture device 210 may include a communications interface 218, which may enable communications with a personal computing device (e.g., a smartphone, a tablet, a laptop computer, or a desktop computer). For example, the communications interface 218 may be used to receive commands controlling image capture and processing in the image capture device 210. For example, the communications interface 218 may be used to transfer image data to a personal computing device. For example, the communications interface 218 may include a wired interface, such as a high-definition multimedia interface (HDMI), a universal serial bus (USB) interface, or a FireWire interface. For example, the communications interface 218 may include a wireless interface, such as a Bluetooth interface, a ZigBee interface, and/or a Wi-Fi interface.

The image capture device 210 may include a user interface 220. For example, the user interface 220 may include an LCD display for presenting images and/or messages to a user. For example, the user interface 220 may include a button or switch enabling a person to manually turn the image capture device 210 on and off. For example, the user interface 220 may include a shutter button for snapping pictures.

The image capture device 210 may include a battery 222 that powers the image capture device 210 and/or its peripherals. For example, the battery 222 may be charged wirelessly or through a micro-USB interface.

FIG. 2B is a block diagram of an example system 230 configured for image capture and stitching. The system 230 includes an image capture device 240 and a personal computing device 260 that communicate via a communications link 250. The image capture device 240 includes a first image sensor 242 and a second image sensor 244 that are configured to capture respective images. The image capture device 240 includes a communications interface 246 configured to transfer images via the communication link 250 to the personal computing device 260. The personal computing device 260 includes a processing apparatus 262 that is configured to receive, using the communications interface 266, a first image from the first image sensor 242 and receive a second image from the second image sensor 244. The processing apparatus 262 may be configured to perform image signal processing (e.g., filtering, stitching, and/or encoding) to generate composite images based on image data from the image sensors 242 and 244. The system 230 may be used to implement techniques described in this disclosure, such as the technique 300 of FIG. 3.

The first image sensor 242 and the second image sensor 244 are configured to detect light of a certain spectrum (e.g., the visible spectrum or the infrared spectrum) and convey information constituting an image as electrical signals (e.g., analog or digital signals). For example, the image sensors 242 and 244 may include charge-coupled devices (CCD) or active pixel sensors in complementary metal-oxide-semiconductor (CMOS). The image sensors 242 and 244 may detect light incident through respective lenses (e.g., a fisheye lens). In some implementations, the image sensors 242 and 244 include analog-to-digital converters. In some implementations, the image sensors 242 and 244 are held in a fixed relative orientation with respective fields of view that overlap. Image signals from the image sensors 242 and 244 may be passed to other components of the image capture device 240 via the bus 248.

The communications link 250 may be a wired communications link or a wireless communications link. The communications interface 246 and the communications interface 266 may enable communications over the communications link 250. For example, the communications interface 246 and the communications interface 266 may include a high-definition multimedia interface (HDMI), a universal serial bus (USB) interface, a FireWire interface, a Bluetooth interface, a ZigBee interface, and/or a Wi-Fi interface. For example, the communications interface 246 and the communications interface 266 may be used to transfer image data from the image capture device 240 to the personal computing device 260 for image signal processing (e.g., filtering, stitching, and/or encoding) to generate composite images based on image data from the image sensors 242 and 244.

The processing apparatus 262 may include one or more processors having single or multiple processing cores. The processing apparatus 262 may include memory, such as random access memory device (RAM), flash memory, or any other suitable type of storage device such as a non-transitory computer readable memory. The memory of the processing apparatus 262 may include executable instructions and data that can be accessed by one or more processors of the processing apparatus 262. For example, the processing apparatus 262 may include one or more DRAM modules such as double data rate synchronous dynamic random-access memory (DDR SDRAM). In some implementations, the processing apparatus 262 may include a digital signal processor (DSP). In some implementations, the processing apparatus 262 may include an application specific integrated circuit (ASIC). For example, the processing apparatus 262 may include a custom image signal processor. The processing apparatus 262 may exchange data (e.g., image data) with other components of the personal computing device 260 via the bus 268.

The personal computing device 260 may include a user interface 264. For example, the user interface 264 may include a touchscreen display for presenting images and/or messages to a user and receiving commands from a user. For example, the user interface 264 may include a button or switch enabling a person to manually turn the personal computing device 260 on and off. In some implementations, commands (e.g., start recording video, stop recording video, or snap photograph) received via the user interface 264 may be passed on to the image capture device 240 via the communications link 250.

FIG. 3 is a flowchart of an example technique 300 for image capture and stitching. The example technique 300 includes receiving 310 image signals from two or more image sensors; stitching 320 the received images to obtain a composite image; identifying 330 one or more image portions that are positioned along a stitching boundary of the composite image; inputting 340 the image portion(s) to a machine learning module to obtain one or more scores; determining, based on the score(s), whether (at operation 345) to re-stitch; selecting 350 one or more parameters of a stitching algorithm based at least in part on the score(s); stitching (at operation 360), using the parameter(s), the received images to obtain a composite image; and storing, displaying, and/or transmitting (at operation 370) an output image based on the composite image. For example, the technique 300 may be implemented by the system 200 of FIG. 2A or the system 230 of FIG. 2B. For example, the technique 300 may be implemented by an image capture device, such as the image capture device 210 shown in FIG. 2A, or an image capture apparatus, such as the image capture apparatus 110 shown in FIG. 1. For example, the technique 300 may be implemented by a personal computing device, such as the personal computing device 260.

The images, including at least a first image from a first image sensor and a second image from a second image sensor, are received 310 from the image sensors. The image sensors may be part of an image capture apparatus (e.g., the image capture apparatus 110, the image capture device 210, or the image capture device 240) that holds the image sensors in a relative orientation such that the image sensors have partially overlapping fields of view. For example, the images may be received 310 from the sensors via a bus (e.g., the bus 224). In some implementations, the images may be received 310 via a communications link (e.g., the communications link 250). For example, the images may be received 310 via a wireless or wired communications interface (e.g., Wi-Fi, Bluetooth, USB, HDMI, Wireless USB, Near Field Communication (NFC), Ethernet, a radio frequency transceiver, and/or other interfaces). For example, the images may be received 310 via communications interface 266.

The example technique 300 includes stitching 320 the first image and the second image to obtain a stitched image. In some implementations, more than two images may be stitched 320 together (e.g., stitching together six images from the image sensors of the image capture apparatus 110 to obtain a spherical image). In some implementations, stitching 320 may include applying parallax correction (e.g., binocular disparity correction for a pair of images) for received images with overlapping fields of view to align the pixels from the images corresponding to objects appearing in multiple fields of view. For example, identifying the alignment for the images may include simultaneously optimizing the correspondence metrics and a smoothness criterion. For example, parallax correction may be applied in one dimension (e.g., parallel to an epipolar line between two image sensors) or in two dimensions. In some implementations, stitching 320 may include applying color correction to better match the pixels of the received images (e.g., to reduce color differences due to variations in the image sensors and respective lenses and/or exposure times). In some implementations, stitching can include blending (e.g., averaging pixel values) pixels from the images being combined within a region along a stitching boundary. For example, blending may smooth the transition across a stitching boundary to make differences less noticeable and improve image quality. For example, stitching 320 may be implemented by a processing apparatus (e.g., the processing apparatus 212 or the processing apparatus 262).
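
As a rough illustration of one dimensional parallax correction, the sketch below searches for the integer shift along one axis that minimizes the sum of squared differences between the overlapping strips of two input images; a real stitcher would also enforce a smoothness criterion and vary the correction along the seam, so this is only a toy under those simplifying assumptions:

```python
import numpy as np

def estimate_shift_1d(strip_a, strip_b, max_shift=20):
    """Estimate the 1-D shift (e.g., along an epipolar line) that best
    aligns strip_b with strip_a by minimizing the sum of squared
    differences (SSD) over the overlap strips."""
    strip_a = strip_a.astype(float)
    strip_b = strip_b.astype(float)
    best_shift, best_cost = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        # np.roll wraps at the borders; a real stitcher would crop instead.
        cost = np.sum((strip_a - np.roll(strip_b, s, axis=1)) ** 2)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift
```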

The example technique 300 includes identifying 330 an image portion of the stitched image that is positioned on a stitching boundary of the stitched image. For example, the image portion may be a block (e.g., an 8×8 block) of pixels from the stitched image that includes pixels based at least in part on pixels from the first image and includes pixels based at least in part on pixels from the second image. For example, the image portion may be a block that extends the length of the seam (e.g., a 1920×8 block for full resolution 1080p video frames) between two images being stitched. In some implementations, one or more additional image portions are identified 330 within the stitched image that occur along the stitching boundary of the stitched image. For example, a stitching seam may be segmented into an array of small image portions (e.g., 8×8 blocks of pixels). For example, image portions may be identified as described in relation to FIG. 6. Because pixel alignment varies with factors such as the distances of objects appearing in both of the overlapping fields of view, the component of the image coordinates perpendicular to the stitching boundary may vary slightly along the length of the stitching boundary. Identifying 330 the image portions may include analyzing parallax correction results from the stitching 320 operation to determine the coordinates of the stitching boundary in a particular region of the stitched image.
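
A minimal sketch of segmenting a horizontal seam into 8×8 blocks centered on the boundary follows; the per-column boundary position is assumed to come from the stitcher's parallax-correction results, and the function and argument names are illustrative:

```python
import numpy as np

def seam_blocks(stitched, boundary_rows, block=8):
    """Cut the seam into block x block image portions that straddle
    the stitching boundary.

    stitched: (H, W, C) stitched image.
    boundary_rows: length-W array giving, for each column, the row of
    the stitching boundary (it may vary slightly along the seam).
    """
    h, w = stitched.shape[:2]
    portions = []
    for x0 in range(0, w - block + 1, block):
        # Use the boundary position at the block's center column.
        r = int(boundary_rows[x0 + block // 2])
        y0 = int(np.clip(r - block // 2, 0, h - block))
        portions.append(stitched[y0:y0 + block, x0:x0 + block])
    return portions
```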

In some implementations, the stitched image is down-sampled when extracting the image portion(s), such that the image portions have a lower resolution than the full resolution stitched image. For example, the image portion may be a block of pixels from the stitched image that includes pixels on both sides of the stitching boundary, where the block of pixels has a resolution less than the resolution of the first image. For example, the image portion(s) may be identified 330 by a processing apparatus (e.g., the processing apparatus 212 or the processing apparatus 262).

The example technique 300 includes inputting 340 the image portion(s) to a machine learning module (e.g., including a neural network, a support vector machine, a decision tree, or a Bayesian network) to obtain a score. The score may be indicative of the quality of the stitching in the image portion from along the stitching boundary. To accomplish this, the machine learning module may be trained to recognize the presence or absence of stitching and its associated artifacts and distortion in image portions (e.g., blocks of pixels) from seams. For example, the machine learning module may have been trained using training data that included image portions labeled to reflect an absence of stitching and image portions labeled to reflect a presence of stitching. The image portions labeled to reflect a presence of stitching may have included stitching boundaries of stitched images. For example, the machine learning module may have been trained using the technique 400 of FIG. 4. In some implementations, the machine learning module may be trained during a design phase of an image capture system and the resulting trained machine learning module may be stored in memory of a processing apparatus implementing the technique 300.

In some implementations, the machine learning module includes a neural network (e.g., a convolutional neural network). For example, the machine learning module may include a neural network that receives pixel values from pixels in the image portion and outputs the score. For example, the machine learning module 710 of FIG. 7A may be employed. In some implementations, the machine learning module may include a support vector machine. For example, the machine learning module 760 of FIG. 7B may be employed.

In some implementations, one or more additional image portions are input 340 to the machine learning module to obtain one or more additional scores. For example, where a seam has been segmented into an array of image portions, multiple image portions from along a seam may be input 340 to the machine learning module to obtain an array of scores. A histogram of the score and the one or more additional scores may be generated. For example, an array of scores (e.g., from along a seam or from along the seams of a sequence of frames of video) may be used to generate a histogram of the scores. A histogram may be used to assess the quality of a stitching algorithm over a variety of scenes. In some implementations, a composite score may be determined based on a collection of scores for individual image portions. For example, the scores for an array of image portions from a seam or set of seams in a stitched image may be averaged to determine a stitching quality score for the stitched image as a whole. For example, scores may be averaged across multiple images (e.g., a sequence of frames of video) to determine a composite score relating to stitching quality.
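
For example, the per-portion scores might be aggregated as in the sketch below (the bin count is an arbitrary choice, and scores are assumed to lie in [0, 1]):

```python
import numpy as np

def aggregate_scores(scores, bins=10):
    """Summarize per-portion seam scores as a histogram (useful for
    assessing a stitching algorithm over many scenes) and a composite
    mean score (for one stitched image or a sequence of video frames)."""
    scores = np.asarray(scores, dtype=float)
    hist, edges = np.histogram(scores, bins=bins, range=(0.0, 1.0))
    return hist, edges, float(scores.mean())
```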

For example, the image portion(s) may be input 340 to the machine learning module by a processing apparatus (e.g., the processing apparatus 212 or the processing apparatus 262).

The score(s) obtained using the machine learning module may be analyzed to determine whether (at operation 345) the stitched image is good enough or should be re-stitched using adjusted parameters for a stitching algorithm. For example, the score(s) or a composite score may be compared to a threshold to determine whether re-stitching should be performed.

The example technique 300 includes selecting 350 a parameter based on the score(s) obtained from the machine learning module. In some implementations, the selected parameter specifies whether one dimensional parallax correction or two dimensional parallax correction will be applied to stitch the first image and the second image. For example, one dimensional parallax correction may be applied initially at stitching operation 320 and, where the score(s) from the machine learning module are in a particular range (e.g., exceeding a threshold), a two dimensional parallax correction (which may have a higher computational complexity) may be selected 350 and applied to stitch the received images at operation 360. In some implementations, the selected parameter specifies a resolution at which the stitching analysis will be performed. In some implementations, the selected parameter is a weight that specifies the relative importance of a correspondence metric versus a smoothness criterion in a cost function that is optimized as part of a parallax correction algorithm. For example, the weight may be chosen to be proportional or inversely proportional to the score from the machine learning module or a composite score based on scores from the machine learning module. For example, a stitching parameter may be selected 350 by a processing apparatus (e.g., the processing apparatus 212 or the processing apparatus 262).
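
A sketch of one such selection policy follows, assuming the earlier score convention where a value near 1 means the seam resembles non-seam data; the threshold value and mode names are illustrative, and other implementations may invert the comparison depending on the label convention:

```python
def select_parallax_mode(composite_score, threshold=0.8):
    """Select between 1-D and 2-D parallax correction from a composite
    seam score. When the seam already looks like non-seam data, keep
    the cheaper 1-D correction; otherwise escalate to the costlier 2-D
    correction for re-stitching."""
    return "parallax_1d" if composite_score >= threshold else "parallax_2d"
```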

When the stitching is finalized, the resulting composite image (e.g., a panoramic or spherical image) may be subject to additional image processing (e.g., output projection mapping and/or encoding in a compressed format) to generate an output image (e.g., a still image or frame of video). In some implementations, the composite image may be the final output image (i.e., no further processing is needed). The output image may then be stored, displayed, and/or transmitted at operation 370. For example, the output image may be transmitted to an external device (e.g., a personal computing device) for display or storage. For example, the output image may be displayed in the user interface 220 or in the user interface 264. For example, the output image may be transmitted via the communications interface 218.

In some implementations, the score(s) or a composite score based on the scores from the machine learning module may be stored, displayed, and/or transmitted at operation 380. For example, the scores may be logged to track performance of stitching algorithms over time as they encounter a diversity of scenes. For example, the score(s) may be transmitted to an external device (e.g., a personal computing device) for display or storage. For example, the score(s) may be displayed in user interface 220 or in user interface 264. For example, the score(s) may be transmitted via the communications interface 218.

FIG. 4 is a flowchart of an example technique 400 for training a machine learning module to enable evaluation of image stitching quality. The technique 400 includes labeling 410 image portions detected with a single image sensor to reflect the absence of stitching; labeling 420 image portions in training data that include a stitching boundary to reflect the presence of stitching; and training 430 the machine learning module using the labeled training data. For example, the technique 400 may be implemented by the system 200 of FIG. 2A or the system 230 of FIG. 2B. For example, the technique 400 may be implemented by an image capture device, such as the image capture device 210 shown in FIG. 2A, or an image capture apparatus, such as the image capture apparatus 110 shown in FIG. 1. For example, the technique 400 may be implemented by a personal computing device, such as the personal computing device 260.

The training data includes image portions detected with a single image sensor that are labeled 410 to reflect an absence of stitching. The training data also includes image portions that include a stitching boundary and that are labeled 420 to reflect a presence of stitching. In some implementations, the labels for this training data are binary. For example, consistent with the scoring convention described above, the image portions without stitching may be labeled with a one to reflect the absence of stitching, while image portions that include stitching are labeled with a zero to reflect the presence of stitching.

The image portions in the training data should match the size of image portions that will be assessed by the machine learning module. For example, the image portions labeled to reflect the presence of stitching may be identified from images available for training in a manner discussed in relation to operation 330 of FIG. 3 and/or FIG. 6. The image portions labeled to reflect the absence of stitching will be of the same size, but taken from regions of an image far from any seam or from images without seams.

The machine learning module (e.g., including a neural network, a support vector machine, a decision tree, or a Bayesian network) is trained 430 using the labeled training data. For example, the machine learning module 710 or the machine learning module 760 may be trained 430 using the labeled training data. The resulting trained machine learning module may be used to assess image quality of image portions of corresponding size from stitched images in an image capture system (e.g., using the technique 300 of FIG. 3).
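
A minimal sketch of the training step with a scikit-learn support vector machine standing in for the machine learning module (labels follow the example convention above: 1 for portions without stitching, 0 for portions containing a stitching boundary; any model with a compatible fit interface could be substituted):

```python
import numpy as np
from sklearn.svm import SVC

def train_seam_classifier(non_seam_blocks, seam_blocks):
    """Train a seam/non-seam classifier on labeled image portions.
    non_seam_blocks: portions from a single image sensor (label 1,
    operation 410). seam_blocks: portions containing a stitching
    boundary (label 0, operation 420)."""
    non_seam = [np.asarray(b, dtype=float).reshape(-1) for b in non_seam_blocks]
    seam = [np.asarray(b, dtype=float).reshape(-1) for b in seam_blocks]
    X = np.stack(non_seam + seam)
    y = np.concatenate([np.ones(len(non_seam)), np.zeros(len(seam))])
    model = SVC(probability=True)  # probability=True enables predict_proba
    return model.fit(X, y)
```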

FIG. 5 is a flowchart of an example technique 500 for training a machine learning module to estimate subjective image quality. The technique 500 includes presenting 510 images to humans; receiving 520 scores for the images from the humans; training 540 a machine learning module with training data that includes image portions from the images labeled 530 with the scores for the images from the humans; inputting 550 an image portion from a first image to the trained machine learning module to obtain an estimate of quality of the first image; and selecting 560 a parameter of an image processing algorithm based on the estimate of quality of the first image. For example, the technique 500 may be implemented by the system 200 of FIG. 2A or the system 230 of FIG. 2B. For example, the technique 500 may be implemented by an image capture device, such as the image capture device 210 shown in FIG. 2A, or an image capture apparatus, such as the image capture apparatus 110 shown in FIG. 1. For example, the technique 500 may be implemented by a personal computing device, such as the personal computing device 260.

In some implementations, the technique 500 is employed in a laboratory using a system of computing devices to gather the subjective scores from humans, train 540 a machine learning module, and use the trained machine learning module to provide feedback for selecting 560 parameters of image processing algorithms under development.

The example technique 500 includes presenting 510 images to humans. For example, images may be displayed in a user interface (e.g., the user interface 264) to a human, and the human may be prompted to input a subjective image quality score for the image in response. For example, the score may be on a scale of 1 to 10 or mapped from text descriptions (e.g., “excellent,” “good,” “fair,” “poor,” “bad”) of quality selected by a user. In some implementations, a large number (e.g., a statistically significant number) of humans are presented with the images to solicit their subjective scores of image quality for the images from which a training data set will be determined. For example, the scores from many humans for a particular image may be averaged or otherwise combined to determine a subjective quality score for that image.
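
For instance, per-image subjective labels might be derived as in the sketch below (the simple mean is one of several reasonable ways to combine raters, and the data layout is an assumption):

```python
from collections import defaultdict

def mean_opinion_scores(ratings):
    """Combine ratings gathered from many humans into one subjective
    quality score per image by averaging. ratings: iterable of
    (image_id, score) pairs, e.g. scores on a 1-to-10 scale."""
    by_image = defaultdict(list)
    for image_id, score in ratings:
        by_image[image_id].append(score)
    return {image_id: sum(s) / len(s) for image_id, s in by_image.items()}
```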

The scores received 520 from the humans may be used to label 530 image portions from their respective images. For example, a stitching seam may be segmented into an array of image portions (e.g., blocks of pixels) and the image portions may be labeled 530 with a subjective score for the image in which the seam occurs.

The example technique 500 includes training 540 the machine learning module, wherein the training data includes image portions labeled with subjective scores provided by humans for images from which the image portions are taken. For example, image portions from a seam in a stitched image that are labeled with the subjective score for that stitched image may be used to train 540 a machine learning module.

The trained machine learning module may then be utilized to approximate subjective image quality scores for new images. For example, portions of new images produced with an image processing algorithm under development (e.g., a noise reduction algorithm, a warp algorithm, a stitching algorithm, a compression algorithm, etc.) may be input 550 to the trained machine learning module to obtain estimates of subjective image quality. These estimates of subjective image quality may provide feedback to facilitate the selection 560 of one or more parameters of the image processing algorithm under consideration.

In some implementations, the trained machine learning module may then be utilized to provide scores as feedback for selecting 560 a parameter of an image processing algorithm (e.g., a noise reduction algorithm, a warp algorithm, a stitching algorithm, a compression algorithm, etc.) in real-time, during the image capture process, to adapt to changing conditions of the scene being recorded.

FIG. 6 shows an example layout 600 of a stitched image 610. The stitched image 610 includes pixels from two images: a top image that is above the stitching boundary 620 and a bottom image that is below the stitching boundary 620. The seam of the stitched image has been segmented into an array of image portions, including image portions 630 and 632. An enlarged view of image portion 640 is shown. Image portion 640, like the other image portions in the array, is an 8×8 block of pixels that includes pixels on both sides of the stitching boundary 620.

The image portions (e.g., image portions 630, 632, and 640) may be sampled at a resolution less than the resolution of the full resolution version of the stitched image 610. For example, the image portions (e.g., image portions 630, 632, and 640) may be sampled at one quarter or one eighth of the full resolution.

In some implementations, operations to obtain a stitched image include a blending operation on the pixels from the top image and the bottom image in a region near the stitching boundary. For example, the pixel values may be calculated as

P_composite = b*P_bottom + (1−b)*P_top,   (Eqn. 1)

where P_composite is a pixel value in the composite image, P_bottom is a corresponding pixel value from the bottom image, P_top is a corresponding pixel value from the top image, and b is a blending ratio that varies vertically across a blending region along the stitching boundary. For example, b may be 1 below a bottom edge of the blending region, b may be at or near 0.5 for pixels right at the stitching boundary, and b may decrease to zero at a top edge of the blending region. The operation of blending pixels from the top image and the bottom image (e.g., such as described by Eqn. 1) may be referred to as a weighted average. In some implementations, a blending operation may be effectuated using one or more pixel masks. By way of an illustration, a mask may include an array of values, wherein a value of 1 may be used to select a pixel at a corresponding location from the top image; a value of 0 may be used to select a pixel from the bottom image; and a value between 0 and 1 may be used to average or blend pixels from the top and bottom images. The mask array may be configured based on dimensions of a blending region, for example, having a width (in number of pixels) equal to the length of the stitching boundary and a height (in number of pixels) equal to the thickness of the blending region. In general, the blending region need not correspond exactly to the extent of the seam or the image portions passed to a machine learning module. For example, the thickness of the blending region may be less than the thickness of the seam. In the example stitched image 610, the blending region coincides with the image portions from the seam that extend from an upper boundary 650 to a lower boundary 652. The height of the blending region in this example is the distance (in number of pixels) between the upper boundary 650 and the lower boundary 652 (i.e., 8 pixels in this example).
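
A sketch of the blend of Eqn. 1 with a linear vertical ramp for b across the blending region follows; the 8-pixel region height echoes the example above, and a mask-based variant would simply precompute b as an array:

```python
import numpy as np

def blend_seam(top, bottom, boundary_row, region=8):
    """Blend two aligned images across a horizontal stitching boundary
    using P_composite = b*P_bottom + (1 - b)*P_top (Eqn. 1), where b
    ramps linearly from 0 at the top edge of the blending region to 1
    at its bottom edge (so b is about 0.5 right at the boundary)."""
    h = top.shape[0]
    rows = np.arange(h, dtype=float)
    y0 = boundary_row - region / 2.0
    b = np.clip((rows - y0) / region, 0.0, 1.0)  # per-row blending ratio
    b = b.reshape(-1, *([1] * (top.ndim - 1)))   # broadcast over width/channels
    return (b * bottom + (1.0 - b) * top).astype(top.dtype)
```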

FIG. 7A is a block diagram of an example machine learning module 710 configured for image quality assessment. The machine learning module 710 includes a convolutional neural network 712. The machine learning module 710 takes image portions (e.g., 8×8 blocks of pixels) as input 720 and passes the pixel values for the image portion to a first layer of the neural network 712. For example, where the input 720 is an 8×8 block of color pixels, an array of 192 color values for the 64 pixels may be passed into the neural network 712 as input. The neural network 712 then produces a score as output 730 in response to the input 720. This score may be indicative of image quality (e.g., stitching quality) of the image portion that was passed in as input 720.

For example, an image portion that is input 720 may be a block of pixels from a stitched image that includes pixels on both sides of a stitching boundary, and all pixel values from the block of pixels may be input to a first layer of the neural network 712. In this manner, an output 730 that is indicative of the stitching quality in the image portion may be obtained.

In some implementations, the image quality assessment may be applied to greyscale images (e.g., just considering a luminance component of an image) or to mosaiced color images that have not been demosaiced to interpolate the colors, such that an 8×8 block of pixels may be passed in as an array of 64 pixel values. These implementations may reduce the complexity of the machine learning module and allow for training with smaller data sets.

In some implementations, the convolutional neural network 712 includes eleven layers in the sequence: 2-D convolution layer, activation layer, 2-D convolution layer, activation layer, dropout layer, flatten layer, dense layer, activation layer, dropout layer, dense layer, activation layer. In some implementations, information about the position of a stitching boundary within the image portion may be preserved by omitting pooling layers from the neural network. For example, the neural network 712 may be implemented using the keras.io library with the structure provided in appendix A.
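
A sketch of such an eleven-layer network against the Keras API is shown below. Only the layer sequence and the omission of pooling follow the description above; the filter counts, kernel sizes, dropout rates, activation functions, and loss are illustrative assumptions.

    from keras.models import Sequential
    from keras.layers import Activation, Conv2D, Dense, Dropout, Flatten

    # Eleven layers, no pooling, so the position of the stitching boundary
    # within the 8x8 input block is preserved through the network.
    model = Sequential([
        Conv2D(32, (3, 3), padding='same', input_shape=(8, 8, 3)),
        Activation('relu'),
        Conv2D(32, (3, 3), padding='same'),
        Activation('relu'),
        Dropout(0.25),
        Flatten(),
        Dense(128),
        Activation('relu'),
        Dropout(0.5),
        Dense(1),
        Activation('sigmoid'),  # score in [0, 1]
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')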

FIG. 7B is a block diagram of an example machine learning module 760 configured for image quality assessment. The machine learning module 760 comprises a feature extraction submodule 762 that is configured to determine features 780 based on an image portion passed in as input 770. In some implementations, the features 780 include high-frequency components of color channel signals of the image portion. For example, the feature extraction submodule 762 may apply high-pass filtering to determine the features 780. In some implementations, the feature extraction submodule 762 is configured to apply an edge detector to determine one or more of the features 780. The extracted features 780 are then passed to a support vector machine module 764 to generate scores that are provided as output 790 of the machine learning module 760.
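
A minimal sketch of this arrangement, assuming a Laplacian filter as the high-pass feature extractor and the scikit-learn support vector machine (neither of which is specified by this disclosure), might be:

    import numpy as np
    from scipy.ndimage import laplace
    from sklearn.svm import SVR

    def extract_features(portion):
        # High-frequency components of each color channel of the portion.
        return np.concatenate([laplace(portion[..., c].astype(float)).ravel()
                               for c in range(portion.shape[-1])])

    # Assuming labeled training portions X and quality labels y are available:
    # svm = SVR().fit([extract_features(p) for p in X], y)
    # score = svm.predict([extract_features(new_portion)])[0]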

FIG. 8 is a flowchart of an example technique 800 for image capture and stitching. The technique 800 is an example of the technique 400 that specifies a manner of utilizing blending in the stitching operations. In some implementations, stitching images may include the steps of: (1) correcting disparity or parallax between the input images to achieve a good correspondence along the stitching boundary, and (2) blending the images in the overlap region (e.g., the seam) to mask imperfections of the disparity/parallax correction.

For example, blending may consist of taking a weighted average of the input images, where the weight varies smoothly with the distance to the stitching boundary. For example, the weight may be a linear function of the distance to the stitching boundary, so that at a certain distance (e.g., 20 pixels) inside a first input image, the weight is 1 for the first input image and 0 for the second input image; at or near the stitching boundary the weights may be 50% for both input images; and at a certain distance (e.g., 20 pixels) inside the second input image, the weight is 0 for the first input image and 1 for the second input image.
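
In code, such a linear weight might be computed as in the hypothetical fragment below, where the 20-pixel half-width matches the example above and the sign convention (positive distances inside the first input image) is an assumption.

    def blend_weight(signed_distance, half_width=20):
        # Weight of the first input image: 1 deep inside it, 0.5 at the
        # stitching boundary, 0 deep inside the second input image.
        w = 0.5 + signed_distance / (2.0 * half_width)
        return min(max(w, 0.0), 1.0)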

Although it is preferable to have a good disparity correction, when such a correction is not found by the disparity correction step, the blending step mitigates the degradation of image quality: instead of an abrupt content change, which is readily detected by the human eye, the resulting image shows a fuzzy area, which, while noticeable and undesirable, can be much more acceptable to the human eye. However, blending can also make stitching boundaries significantly more difficult to detect for a machine learning module used to evaluate the quality of the stitching, making a stitching quality assessment system less sensitive to disparity correction errors.

One example solution is to present to the machine learning module a non-blended stitched image, in which the content of the input images is abruptly stitched together, so that it is easier for the machine learning module to evaluate the quality of the disparity or parallax correction. This way, parameters or algorithm details of the disparity correction module can be more easily adjusted. In some implementations, the non-blended stitched image is computed for the machine learning module, while the final output image shown to the user is a smoothly blended stitched image. This selective use of blending may facilitate adjustment of disparity or parallax correction parameters while smoothing over disparity correction errors and other distortions in a composite image that may be output for viewing.

The example technique 800 includes receiving 810 image signals from two or more image sensors; stitching 820, without blending, the received images to obtain a composite image; identifying 830 one or more image portions that are positioned along a stitching boundary of the composite image; inputting 840 the image portion(s) to a machine learning module to obtain one or more scores; determining, based on the score(s), whether 845 to re-stitch; selecting 850 one or more parameters of a stitching algorithm based at least in part on the score(s); stitching (at operation 860), using the parameter(s) and with blending, the received images to obtain a composite image; and storing, displaying, and/or transmitting (at operation 870) an output image based on the composite image. For example, the technique 800 may be implemented by the system 200 of FIG. 2A or the system 230 of FIG. 2B. For example, the technique 800 may be implemented by an image capture device, such as the image capture device 210 shown in FIG. 2A, or an image capture apparatus, such as the image capture apparatus 110 shown in FIG. 1. For example, the technique 800 may be implemented by a personal computing device, such as the personal computing device 260.

The images, including at least a first image from a first image sensor and a second image from a second image sensor, are received 810 from the image sensors. The images may be received 810 as described in relation to operation 310 of FIG. 3.

The example technique 800 includes stitching 820, without blending, the first image and the second image to obtain a stitched image. For example, stitching 820 to obtain the stitched image is performed without blending, such that individual pixels of the stitched image are respectively based on either the first image or the second image, but not both. In some implementations, more than two images may be stitched 820 together (e.g., stitching together six images from the image sensors of the image capture apparatus 110 to obtain a spherical image). In some implementations, stitching 820 may include applying parallax correction (e.g., binocular disparity correction for a pair of images) for received images with overlapping fields of view to align the pixels from the images corresponding to objects appearing in multiple fields of view. For example, identifying the alignment for the images may include simultaneously optimizing correspondence metrics and a smoothness criterion. For example, parallax correction may be applied in one dimension (e.g., parallel to an epipolar line between two image sensors) or in two dimensions. In some implementations, stitching 820 may include applying color correction to better match the pixels of the received images (e.g., to reduce color differences due to variations in the image sensors and their respective lenses and/or exposure times). For example, stitching 820 may be implemented by a processing apparatus (e.g., the processing apparatus 212 or the processing apparatus 262).
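
For two already-aligned, vertically adjacent images, stitching without blending can be sketched as below; this hypothetical fragment assumes parallax and color correction have already been applied, so each output pixel is copied from exactly one input.

    import numpy as np

    def stitch_without_blending(top_img, bottom_img, boundary_row):
        # Rows above the boundary come from the top image; rows at or
        # below it come from the bottom image. No pixel mixes both inputs.
        return np.vstack([top_img[:boundary_row], bottom_img[boundary_row:]])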

The example technique 800 includes identifying 830 an image portion of the stitched image that is positioned on a stitching boundary of the stitched image. For example, the image portion may be identified 830 as described in relation to operation 330 of FIG. 3.

The example technique 800 includes inputting 840 the image portion(s) to a machine learning module (e.g., including a neural network, a support vector machine, a decision tree, or a Bayesian network) to obtain a score. The score may be indicative of the quality of the stitching in the image portion from along the stitching boundary. To accomplish this, the machine learning module may be trained to recognize the presence or absence of stitching and its associated artifacts and distortion in image portions (e.g., blocks of pixels) from seams. For example, the machine learning module may have been trained using training data that included image portions labeled to reflect an absence of stitching and image portions labeled to reflect a presence of stitching. The image portions labeled to reflect a presence of stitching may have included stitching boundaries of stitched images. For example, the image portions of the training data labeled to reflect a presence of stitching may have included stitching boundaries of stitched images that were stitched without blending. For example, the machine learning module may have been trained using the technique 400 of FIG. 4. In some implementations, the machine learning module may be trained during a design phase of an image capture system and the resulting trained machine learning module may be stored in memory of a processing apparatus implementing the technique 800.

In some implementations, the machine learning module includes a neural network (e.g., a convolutional neural network). For example, the machine learning module may include a neural network that receives pixel values from pixels in the image portion and outputs the score. For example, the machine learning module 710 of FIG. 7A may be employed. In some implementations, the machine learning module may include a support vector machine. For example, the machine learning module 760 of FIG. 7B may be employed.

In some implementations, one or more additional image portions are input 840 to the machine learning module to obtain one or more additional scores. For example, where a seam has been segmented into an array of image portions, multiple image portions from along a seam may be input 840 to the machine learning module to obtain an array of scores. A histogram of the score and the one or more additional scores may be generated. For example, an array of scores (e.g., from along a seam or from along the seams of a sequence of frames of video) may be used to generate a histogram of the scores. A histogram may be used to assess the quality of a stitching algorithm over a variety of scenes. In some implementations, a composite score may be determined based on a collection of scores for individual image portions. For example, the scores for an array of image portions from a seam or set of seams in a stitched image may be averaged to determine a stitching quality score for the stitched image as a whole. For example, scores may be averaged across multiple images (e.g., a sequence of frames of video) to determine a composite score relating to stitching quality.
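
For instance, given an array of per-portion scores, a histogram and a mean composite score could be produced as in this sketch (the bin count and the [0, 1] score range are assumptions):

    import numpy as np

    def summarize_scores(seam_scores, bins=10):
        # Histogram over the seam (or over a sequence of frames) plus a
        # simple mean as the composite stitching-quality score.
        scores = np.asarray(seam_scores, dtype=float)
        hist, edges = np.histogram(scores, bins=bins, range=(0.0, 1.0))
        return hist, scores.mean()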

For example, the image portion(s) may be input 840 to the machine learning module by a processing apparatus (e.g., the processing apparatus 212 or the processing apparatus 262).

The score(s) obtained using the machine learning module may be analyzed to determine whether (at operation 845) the stitched image is good enough or should be re-stitched using adjusted parameters for a stitching algorithm. For example, the score(s) or a composite score may be compared to a threshold to determine whether re-stitching should be performed.

The example technique 800 includes selecting 850 a parameter based on the score(s) obtained from the machine learning module. In some implementations, the selected parameter specifies whether one-dimensional parallax correction or two-dimensional parallax correction will be applied to stitch the first image and the second image. For example, one-dimensional parallax correction may be applied initially at stitching operation 820 and, where the score(s) from the machine learning module are in a particular range (e.g., exceeding a threshold), a two-dimensional parallax correction (which may have a higher computational complexity) may be selected 850 and applied to stitch, with blending, the received images at operation 860. For example, stitching 860 to obtain a composite image is performed with blending, such that at least one pixel of the composite image is based on both the first image and the second image. In some implementations, the selected parameter specifies a resolution at which the stitching analysis will be performed. In some implementations, the selected parameter is a weight that specifies the relative importance of a correspondence metric versus a smoothness criterion in a cost function that is optimized as part of a parallax correction algorithm. For example, the weight may be chosen to be proportional or inversely proportional to the score from the machine learning module or a composite score based on scores from the machine learning module. For example, a stitching parameter may be selected 850 by a processing apparatus (e.g., the processing apparatus 212 or the processing apparatus 262).
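
The escalation from one-dimensional to two-dimensional parallax correction might be sketched as follows; the threshold value, the function name, and the convention that higher scores indicate more visible stitching are assumptions made for illustration.

    def select_parallax_mode(composite_score, threshold=0.5):
        # Escalate to the costlier two-dimensional parallax correction
        # when the seam scores indicate visible stitching artifacts.
        return '2d' if composite_score > threshold else '1d'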

When the stitching is finalized, the resulting composite image (e.g., a panoramic or spherical image) may be subject to additional image processing (e.g., output projection mapping and/or encoding in a compressed format) to generate an output image (e.g., a still image or a frame of video). In some implementations, the composite image may be the final output image (i.e., no further processing is needed). The output image may then be stored, displayed, and/or transmitted at operation 870 (e.g., as described in relation to operation 370 of FIG. 3). In some implementations, the score(s) or a composite score based on the scores from the machine learning module may be stored, displayed, and/or transmitted at operation 880 (e.g., as described in relation to operation 380 of FIG. 3).

While the disclosure has been described in connection with certain embodiments, it is to be understood that the disclosure is not to be limited to the disclosed embodiments but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims, which scope is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures as is permitted under the law.

1-20. (canceled)
21. A system comprising: a first image sensor configured to capture a first image; a second image sensor configured to capture a second image; and a processing apparatus that is configured to: receive the first image from the first image sensor; receive the second image from the second image sensor; stitch the first image and the second image to obtain a stitched image; identify an image portion of the stitched image that is positioned on a stitching boundary of the stitched image; and input the image portion to a machine learning module to obtain a score, wherein the machine learning module has been trained using training data that included image portions labeled to reflect an absence of stitching and image portions labeled to reflect a presence of stitching, wherein the image portions labeled to reflect a presence of stitching included stitching boundaries of stitched images.
22. The system of claim 21, in which the processing apparatus is configured to: identify one or more additional image portions within the stitched image that occur along the stitching boundary of the stitched image; and input the one or more additional image portions to the machine learning module to obtain one or more additional scores.
23. The system of claim 21, in which the machine learning module comprises a feature extraction submodule that is configured to determine features based on the image portion.
24. The system of claim 21, in which stitching to obtain the stitched image is performed such that individual pixels of the stitched image are respectively based on either the first image or the second image, but not both.
25. The system of claim 21, in which stitching to obtain the stitched image is performed such that individual pixels of the stitched image are respectively based on either the first image or the second image, but not both; and in which the image portions of the training data labeled to reflect a presence of stitching included stitching boundaries of stitched images that were stitched without blending.
26. The system of claim 21, in which the image portion is a block of pixels from the stitched image that includes pixels on both sides of the stitching boundary, where the block of pixels has a resolution less than the resolution of the first image.
27. The system of claim 21, in which the machine learning module includes a convolutional neural network.
28. The system of claim 21, in which the machine learning module includes a neural network and the image portion is a block of pixels from the stitched image that includes pixels on both sides of the stitching boundary and all pixel values from the block of pixels are input to a first layer of the neural network.
29. A method comprising: receiving a first image from a first image sensor; receiving a second image from a second image sensor; stitching the first image and the second image to obtain a stitched image; identifying an image portion of the stitched image that is positioned on a stitching boundary of the stitched image; inputting the image portion to a machine learning module to obtain a score, wherein the machine learning module has been trained using training data that included image portions labeled to reflect an absence of stitching and image portions labeled to reflect a presence of stitching, wherein the image portions labeled to reflect a presence of stitching included stitching boundaries of stitched images; and storing, displaying, or transmitting the score or a composite score based in part on the score.
30. The method of claim 29, comprising: identifying one or more additional image portions within the stitched image that occur along the stitching boundary of the stitched image; inputting the one or more additional image portions to the machine learning module to obtain one or more additional scores; and generating a histogram of the score and the one or more additional scores.
31. The method of claim 29, comprising: training the machine learning module, wherein the training data includes image portions detected with a single image sensor that are labeled to reflect an absence of stitching.
32. The method of claim 29, comprising: training the machine learning module, wherein the training data includes image portions labeled with subjective scores provided by humans for images from which the image portions are taken.
33. The method of claim 29, comprising: selecting a parameter of a stitching algorithm based on the score.
34. The method of claim 29, in which the image portion is a block of pixels from the stitched image that includes pixels on both sides of the stitching boundary, where the block of pixels has a resolution less than the resolution of the first image.
35. The method of claim 29, in which the machine learning module includes a neural network that receives pixel values from pixels in the image portion and outputs the score.
36. The method of claim 29, comprising: obtaining a plurality of scores from the machine learning module for a plurality of image portions from along the stitching boundary of the stitched image; and determining a composite score for the stitched image based on the plurality of scores.
37. A method comprising: labeling image portions in training data that were detected with a single image sensor to reflect the absence of stitching; labeling image portions in training data that include a stitching boundary to reflect the presence of stitching; and training a machine learning module using the labeled training data, wherein the machine learning module takes an image portion as input and outputs a score.
38. The method of claim 37, in which the machine learning module includes a neural network and each labeled image portion in the training data is a block of pixels and all pixel values from the block of pixels are input to a first layer of the neural network.
39. The method of claim 37, in which the machine learning module includes a convolutional neural network.
40. The method of claim 37, in which the machine learning module includes a feature extraction submodule and a support vector machine.